Results 1 - 20 of 14,158
1.
Sci Rep ; 14(1): 8159, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589623

ABSTRACT

Whole-genome sequencing (WGS) is currently making its transition from research tool into routine (clinical) diagnostic practice. The workflow for WGS includes the highly labor-intensive library preparations (LP), one of the most critical steps in the WGS procedure. Here, we describe the automation of the LP on the flowbot ONE robot to minimize the risk of human error and reduce hands-on time (HOT). For this, the robot was equipped, programmed, and optimized to perform the Illumina DNA Prep automatically. Results obtained from 16 LP that were performed both manually and automatically showed comparable library DNA yields (median of 1.5-fold difference), similar assembly quality values, and 100% concordance on the final core genome multilocus sequence typing results. In addition, reproducibility of results was confirmed by re-processing eight of the 16 LPs using the automated workflow. With the automated workflow, the HOT was reduced to 25 min compared to the 125 min needed when performing eight LPs using the manual workflow. The turn-around time was 170 and 200 min for the automated and manual workflow, respectively. In summary, the automated workflow on the flowbot ONE generates consistent results in terms of reliability and reproducibility, while significantly reducing HOT as compared to manual LP.
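
The time savings reported above can be turned into relative reductions. The following is a minimal sketch that uses only the four durations quoted in the abstract; the helper name `percent_reduction` is an illustrative assumption, not something taken from the paper.

```python
# Durations quoted in the abstract, in minutes.
hot_manual, hot_auto = 125, 25   # hands-on time (HOT)
tat_manual, tat_auto = 200, 170  # turn-around time


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


hot_saving = percent_reduction(hot_manual, hot_auto)
tat_saving = percent_reduction(tat_manual, tat_auto)

print(f"HOT reduced by {hot_saving:.0f}%, turn-around time by {tat_saving:.0f}%")
```

On these figures, automation removes most of the hands-on time while the total turn-around time changes far less.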


Subject(s)
Lipopolysaccharides , Robotics , Humans , Reproducibility of Results , High-Throughput Nucleotide Sequencing/methods , Gene Library , Whole Genome Sequencing , DNA , Workflow
2.
Genome Biol ; 25(1): 91, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589937

ABSTRACT

BACKGROUND: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.


Subject(s)
Algorithms , Benchmarking , Humans , Genotype , Genomics/methods , Genotyping Techniques/methods , Genome, Plant , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
3.
Genome Biol ; 25(1): 101, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641647

ABSTRACT

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor's variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.


Subject(s)
Genome , Genomics , Genomics/methods , Computational Biology , INDEL Mutation , Bias , Sequence Analysis, DNA/methods , Software , High-Throughput Nucleotide Sequencing/methods
4.
Pediatr Int ; 66(1): e15760, 2024.
Article in English | MEDLINE | ID: mdl-38641939

ABSTRACT

Diseases are caused by genetic and/or environmental factors. For pediatricians, it is important to understand the pathomechanism of monogenic diseases, which are caused only by genetic factors, especially those with prenatal or childhood onset. Identifying "novel" disease genes and elucidating how genomic changes lead to human phenotypes could lead to new therapeutic approaches for rare diseases for which no fundamental cure has yet been established. Genomic analysis has evolved along with the development of analytical techniques, from Sanger sequencing (first-generation sequencing) to techniques such as comparative genomic hybridization, massively parallel short-read sequencing (using a next-generation sequencer or second-generation sequencer) and long-read sequencing (using a next-next generation sequencer or third-generation sequencer). I have been researching human genetics using conventional and new technologies, together with my mentors and numerous collaborators, and have identified genes responsible for more than 60 diseases. Here, an overview of genomic analyses of monogenic diseases that aim to identify novel disease genes is presented, along with several examples that use different approaches depending on the disease characteristics.


Subject(s)
High-Throughput Nucleotide Sequencing , Humans , Child , Comparative Genomic Hybridization , Phenotype , High-Throughput Nucleotide Sequencing/methods
5.
Sci Rep ; 14(1): 7988, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580715

ABSTRACT

In the human genome, heterozygous sites refer to genomic positions with a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , DNA/genetics , Genome, Human
6.
Biochemistry (Mosc) ; 89(Suppl 1): S234-S248, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38621753

ABSTRACT

This review highlights the operating principles, features, and recent developments of third-generation sequencing technology for biopolymers, focusing on nucleic acid analysis with nanopore sequencing systems. The basics of the method and the technical solutions used to implement it are considered, from the first works showing that such systems could be built to the easy-to-handle procedure developed by Oxford Nanopore Technologies. The review also covers applications developed and implemented with Oxford Nanopore Technologies equipment, including whole-genome assembly, metagenomics, and direct detection of modified bases.


Subject(s)
Nanopore Sequencing , Nanopores , Sequence Analysis, DNA/methods , Biopolymers , High-Throughput Nucleotide Sequencing/methods
7.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38565260

ABSTRACT

MOTIVATION: Automated chromatin segmentation based on ChIP-seq (chromatin immunoprecipitation followed by sequencing) data reveals insights into the epigenetic regulation of chromatin accessibility. Existing segmentation methods are constrained by simplifying modeling assumptions, which may have a negative impact on the segmentation quality. RESULTS: We introduce EpiSegMix, a novel segmentation method based on a hidden Markov model with flexible read count distribution types and state duration modeling, allowing for a more flexible modeling of both histone signals and segment lengths. In a comparison with existing tools, ChromHMM, Segway, and EpiCSeg, we show that EpiSegMix is more predictive of cell biology, such as gene expression. Its flexible framework enables it to fit an accurate probabilistic model, which has the potential to increase the biological interpretability of chromatin states. AVAILABILITY AND IMPLEMENTATION: Source code: https://gitlab.com/rahmannlab/episegmix.


Subject(s)
Chromatin , Epigenesis, Genetic , Sequence Analysis, DNA/methods , Histones/metabolism , Software , High-Throughput Nucleotide Sequencing/methods
8.
Genome Biol ; 25(1): 90, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589969

ABSTRACT

Single-cell ATAC-seq has emerged as a powerful approach for revealing candidate cis-regulatory elements genome-wide at cell-type resolution. However, current single-cell methods suffer from limited throughput and high costs. Here, we present a novel technique called scifi-ATAC-seq, single-cell combinatorial fluidic indexing ATAC-sequencing, which combines a barcoded Tn5 pre-indexing step with droplet-based single-cell ATAC-seq using the 10X Genomics platform. With scifi-ATAC-seq, up to 200,000 nuclei across multiple samples can be indexed in a single emulsion reaction, representing an approximately 20-fold increase in throughput compared to the standard 10X Genomics workflow.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Cell Nucleus
9.
BMC Pediatr ; 24(1): 230, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38561707

ABSTRACT

BACKGROUND: Newborn screening (NBS), such as tandem mass spectrometry (MS/MS), may yield false positive/negative results. Next-generation sequencing (NGS) has the potential to provide increased data output, efficiencies, and applications. This study aimed to analyze the types and distribution of pathogenic gene mutations in newborns in Huzhou, Zhejiang province, China and explore the applicability of NGS and MS/MS in NBS. METHODS: Blood spot samples from 1263 newborns were collected. NGS was employed to screen for pathogenic variants in 542 disease-causing genes, and detected variants were validated using Sanger sequencing. Simultaneously, 26 inherited metabolic diseases (IMD) were screened using MS/MS. Positive or suspicious samples identified through MS/MS were cross-referenced with the results of NGS. RESULTS: Among all newborns, 328 had no gene mutations detected. NGS revealed at least one gene mutation in 935 newborns, with a mutation rate of 74.0%. The top 5 genes were FLG, GJB2, UGT1A1, USH2A, and DUOX2. According to American College of Medical Genetics guidelines, gene mutations in 260 cases were classified as pathogenic or likely pathogenic mutation, with a positive rate of 20.6%. The top 5 genes were UGT1A1, FLG, GJB2, MEFV, and G6PD. MS/MS identified 18 positive or suspicious samples for IMD and 1245 negative samples. Verification of these cases by NGS results showed no pathogenic mutations, resulting in a false positive rate of 1.4% (18/1263). CONCLUSION: NBS using NGS technology broadened the range of diseases screened, and enhanced the accuracy of diagnoses in comparison to MS/MS for screening IMD. Combining NGS and biochemical screening would improve the efficiency of current NBS.
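
The percentages in the RESULTS section follow directly from the counts given. The sketch below simply recomputes them as a consistency check; the variable names and the `pct` helper are illustrative assumptions, and the denominator for the false positive rate (all 1263 newborns) is the one the abstract itself uses.

```python
# Counts quoted in the abstract.
total_newborns = 1263
no_variant = 328          # no gene mutation detected by NGS
any_variant = 935         # at least one gene mutation detected by NGS
pathogenic = 260          # pathogenic or likely pathogenic (ACMG guidelines)
msms_positive = 18        # positive or suspicious by MS/MS
msms_negative = 1245      # negative by MS/MS


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# The sub-groups should add up to the cohort size.
counts_consistent = (
    no_variant + any_variant == total_newborns
    and msms_positive + msms_negative == total_newborns
)

mutation_rate = pct(any_variant, total_newborns)
positive_rate = pct(pathogenic, total_newborns)
false_positive_rate = pct(msms_positive, total_newborns)

print(counts_consistent, mutation_rate, positive_rate, false_positive_rate)
```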


Subject(s)
Metabolic Diseases , Neonatal Screening , Infant, Newborn , Humans , Neonatal Screening/methods , Tandem Mass Spectrometry , Metabolic Diseases/diagnosis , Mutation , High-Throughput Nucleotide Sequencing/methods , Pyrin/genetics
10.
Cancer Med ; 13(7): e7162, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38572952

ABSTRACT

PURPOSE: Genetic mutation detection has become an important step in nonsmall-cell lung cancer (NSCLC) treatment because of the increasing number of drugs that target genomic rearrangements. A multiplex test that can detect multiple gene mutations prior to treatment is thus necessary. Currently, either next-generation sequencing (NGS)-based or polymerase chain reaction (PCR)-based tests are used. We evaluated the performance of the Oncomine Dx Target Test (ODxTT), an NGS-based multiplex biomarker panel test, and the AmoyDx Pan Lung Cancer PCR Panel (AmoyDx PLC panel), a real-time PCR-based multiplex biomarker panel test. MATERIALS AND METHODS: Patients with histologically diagnosed NSCLC and a sufficient sample volume to simultaneously perform the AmoyDx PLC panel and ODxTT-M were included in the study. The success and detection rates of both tests were evaluated. RESULTS: Biopsies revealed 116 cases of malignancies, 100 of which were NSCLC. Of these, 59 met the inclusion criteria and were eligible for analysis. The success rates were 100% and 98% for AmoyDx PLC panel and ODxTT-M, respectively. Nine driver mutations were detected in 35.9% and 37.3% of AmoyDx PLC and ODxTT-M panels, respectively. EGFR mutations were detected in 14% and 12% of samples using the AmoyDx PLC panel and ODxTT-M, respectively. Of the 58 cases in which both NGS and AmoyDx PLC panels were successful, discordant results were observed in seven cases. These differences were mainly due to different sensitivities of the detection methods used and the gene variants targeted in each test. DISCUSSION: The AmoyDx PLC panel, a PCR-based multiplex diagnostic test, exhibits a high success rate. The frequency of the nine genes targeted for treatment detected by the AmoyDx PLC panel was comparable to the frequency of mutations detected by ODxTT-M. Clinicians should understand and use the AmoyDx PLC panel and ODxTT-M with respect to their respective performances and limitations.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Lung Neoplasms/drug therapy , Multiplex Polymerase Chain Reaction , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/drug therapy , Mutation , High-Throughput Nucleotide Sequencing/methods , Biomarkers
11.
Microb Genom ; 10(4), 2024 Apr.
Article in English | MEDLINE | ID: mdl-38578268

ABSTRACT

Background. PCR amplification is a necessary step in many next-generation sequencing (NGS) library preparation methods [1, 2]. Whilst many PCR enzymes are developed to amplify single targets efficiently, accurately and with specificity, few are developed to meet the challenges imposed by NGS PCR, namely unbiased amplification of a wide range of different sizes and GC content. As a result, PCR amplification during NGS library prep often results in bias toward GC-neutral and smaller fragments. As NGS has matured, optimized NGS library prep kits and polymerase formulations have emerged, and in this study we have tested a wide selection of available enzymes for both short-read Illumina library preparation and long fragment amplification ahead of long-read sequencing. We tested over 20 different high-fidelity PCR enzymes/NGS amplification mixes on a range of Illumina library templates of varying GC content and composition, and find that both yield and genome coverage uniformity characteristics of the commercially available enzymes varied dramatically. Three enzymes, Quantabio RepliQa Hifi Toughmix, Watchmaker Library Amplification Hot Start Master Mix (2X) 'Equinox' and Takara Ex Premier, were found to give a consistent performance, over all genomes, that mirrored closely that observed for PCR-free datasets. We also test a range of enzymes for long-read sequencing by amplifying size-fractionated S. cerevisiae DNA of average size 21.6 and 13.4 kb, respectively. The enzymes of choice for short-read (Illumina) library fragment amplification are Quantabio RepliQa Hifi Toughmix, Watchmaker Library Amplification Hot Start Master Mix (2X) 'Equinox' and Takara Ex Premier, with RepliQa also being the best performing enzyme from the enzymes tested for long fragment amplification prior to long-read sequencing.


Subject(s)
DNA , Saccharomyces cerevisiae , Polymerase Chain Reaction/methods , Gene Library , High-Throughput Nucleotide Sequencing/methods
12.
JCO Precis Oncol ; 8: e2300567, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38579192

ABSTRACT

PURPOSE: There are limited data available on the real-world patterns of molecular testing in men with advanced prostate cancer. We thus sought to evaluate next-generation sequencing (NGS) testing in the United States, focused on single versus serial NGS testing, the different disease states of testing (hormone-sensitive vs castration-resistant, metastatic vs nonmetastatic), tissue versus plasma circulating tumor DNA (ctDNA) assays, and how often actionable data were found on each NGS test. METHODS: The Prostate Cancer Precision Medicine Multi-Institutional Collaborative Effort clinical-genomic database was used for this retrospective analysis, including 1,597 patients across 15 institutions. Actionable NGS data were defined as including somatic alterations in homologous recombination repair genes, mismatch repair deficiency, microsatellite instability (MSI-high), or a high tumor mutational burden ≥10 mut/Mb. RESULTS: Serial NGS testing (two or more NGS tests with specimens collected more than 60 days apart) was performed in 9% (n = 144) of patients with a median of 182 days in between test results. For the second NGS test and beyond, 82.1% (225 of 274) of tests were from ctDNA assays and 76.1% (217 of 285) were collected in the metastatic castration-resistant setting. New actionable data were found on 11.1% (16 of 144) of second NGS tests, with 3.5% (5 of 144) of tests detecting a new BRCA2 alteration or MSI-high. A targeted therapy (poly (ADP-ribose) polymerase inhibitor or immunotherapy) was given after an actionable result on the second NGS test in 31.3% (5 of 16) of patients. CONCLUSION: Repeat somatic NGS testing in men with prostate cancer is infrequently performed in practice and can identify new actionable alterations not present with initial testing, suggesting the utility of repeat molecular profiling with tissue or blood of men with metastatic castration-resistant prostate cancer to guide therapy choices.
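
The proportions reported in the RESULTS can be reproduced from the numerators and denominators given in the text. This is a small sketch under the assumption that the abstract rounds halves upward; `Decimal` with `ROUND_HALF_UP` is used because Python's built-in `round` would turn 31.25 into 31.2. The `pct` helper and the dictionary labels are illustrative, not from the paper.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(numerator: int, denominator: int) -> Decimal:
    """Percentage to one decimal place, rounding halves up."""
    value = Decimal(100 * numerator) / Decimal(denominator)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (numerator, denominator) pairs quoted in the abstract.
reported = {
    "patients with serial NGS testing": (144, 1597),
    "repeat tests from ctDNA assays": (225, 274),
    "repeat tests in metastatic castration-resistant setting": (217, 285),
    "second tests with new actionable data": (16, 144),
    "second tests with new BRCA2 alteration or MSI-high": (5, 144),
    "targeted therapy after actionable second test": (5, 16),
}

for label, (n, d) in reported.items():
    print(f"{label}: {pct(n, d)}% ({n} of {d})")
```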


Subject(s)
Antineoplastic Agents , Circulating Tumor DNA , Prostatic Neoplasms , Male , Humans , Retrospective Studies , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Prostatic Neoplasms/drug therapy , Circulating Tumor DNA/genetics , Antineoplastic Agents/therapeutic use , Poly(ADP-ribose) Polymerase Inhibitors/therapeutic use , High-Throughput Nucleotide Sequencing/methods
13.
Nat Commun ; 15(1): 2964, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580638

ABSTRACT

The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. Especially, PECAT achieves nearly haplotype-resolved assembly on B. taurus (Bison×Simmental) using Nanopore R9 reads and phase block NG50 with 59.4/58.0 Mb for HG002 using Nanopore R10 reads.
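
The abstract reports a phase block NG50. As a reminder of what that statistic measures, here is a minimal sketch of the standard NG50 definition: the length of the block at which the cumulative length of blocks, taken from longest to shortest, first reaches half of the estimated genome size. The block lengths and genome sizes below are made-up toy values, not data from the paper, and this is not PECAT's own code.

```python
from typing import Iterable, Optional


def ng50(block_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of the given blocks, or None if the blocks
    together cover less than half of the estimated genome size."""
    target = genome_size / 2
    running_total = 0
    for length in sorted(block_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


# Toy example (hypothetical lengths in Mb).
toy_blocks = [40, 30, 20, 10]
print(ng50(toy_blocks, genome_size=120))  # longest blocks 40 + 30 reach half of 120
print(ng50(toy_blocks, genome_size=300))  # blocks cover under half of 300
```

Unlike N50, which uses the total assembled length as its reference, NG50 uses the genome size, so assemblies of different completeness remain comparable.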


Subject(s)
Diploidy , Nanopores , Alleles , Haplotypes , Heterozygote , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
14.
Biomed Environ Sci ; 37(3): 294-302, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38582993

ABSTRACT

Objective: Viral encephalitis is an infectious disease severely affecting human health. It is caused by a wide variety of viral pathogens, including herpes viruses, flaviviruses, enteroviruses, and other viruses. The laboratory diagnosis of viral encephalitis is a worldwide challenge. Recently, high-throughput sequencing technology has provided new tools for diagnosing central nervous system infections. Thus, in this study, we established a multipathogen detection platform for viral encephalitis based on amplicon sequencing. Methods: We designed nine pairs of specific polymerase chain reaction (PCR) primers for the 12 viruses by reviewing the relevant literature. The detection ability of the primers was verified by software simulation and the detection of known positive samples. Amplicon sequencing was used to validate the samples, and consistency was compared with Sanger sequencing. Results: The results showed that the target sequences of various pathogens were obtained at a coverage depth level greater than 20×, and the sequence lengths were consistent with the sizes of the predicted amplicons. The sequences were verified using the National Center for Biotechnology Information BLAST, and all results were consistent with the results of Sanger sequencing. Conclusion: Amplicon-based high-throughput sequencing technology is feasible as a supplementary method for the pathogenic detection of viral encephalitis. It is also a useful tool for the high-volume screening of clinical samples.


Subject(s)
Encephalitis, Viral , Viruses , Humans , Encephalitis, Viral/diagnosis , Viruses/genetics , High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction , DNA, Viral
15.
Zhonghua Zhong Liu Za Zhi ; 46(4): 274-284, 2024 Apr 23.
Article in Chinese | MEDLINE | ID: mdl-38644265

ABSTRACT

In-hospital laboratory-developed testing is of great significance for clinical tests that have not been approved by the National Medical Products Administration but are urgently needed in clinical practice. With the development of cancer precision medicine in recent years, comprehensive genomic profiling (CGP) has become an important means of detecting drug targets, precise molecular typing, and immunotherapy biomarkers in cancer patients. However, there is still a lack of unified understanding and consensus on clinical testing standards and application specifications for laboratory-developed testing in hospitals. The Molecular Pathology Collaboration Group of the Cancer Experts Committee of the Chinese Anti-Cancer Association and the Molecular Pathology Group of the Pathology Branch of the Chinese Medical Association initiated the expert consensus on relevant specifications for analytical validation of CGP next-generation sequencing (NGS) testing in Chinese hospitals. Drawing on domestic clinical practice and on domestic and international literature, the consensus covers the background of laboratory-developed testing, analytical validation scenarios, evaluation indicators and variation ranges, sample types and quantities covered by analytical validation, clinical performance and drug efficacy determination, site personnel for analytical validation, quality control, inter-laboratory quality evaluation, and document management. After discussion by the expert group, 12 consensus statements were formed to provide a reference for the analytical validation and clinical application of tumor CGP NGS testing in Chinese hospitals, so as to promote the application of laboratory-developed testing in Chinese hospitals.


Subject(s)
Consensus , High-Throughput Nucleotide Sequencing , Neoplasms , Humans , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , China , Genomics/methods , Precision Medicine/methods , Quality Control
16.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38603604

ABSTRACT

MOTIVATION: Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, properly identifying which variants are causative of a genetic disease remains an important challenge, often due to the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as would be required under the oligogenic inheritance model, simply blows this problem out of proportion. RESULTS: We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions in order to effectively rank variant combinations based on how likely they are to explain the patient's phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. Whereas the known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes, 71% are found in the same ranking range when considering the independent set. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts for digenic ranking using monogenic pathogenicity scores. AVAILABILITY AND IMPLEMENTATION: Hop is available at https://github.com/oligogenic/HOP.


Subject(s)
Exome , Humans , Exome Sequencing/methods , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods
17.
Nat Commun ; 15(1): 3126, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605047

ABSTRACT

Long reads that cover more variants per read raise opportunities for accurate haplotype construction, whereas the genotype errors of single nucleotide polymorphisms pose great computational challenges for haplotyping tools. Here we introduce KSNP, an efficient haplotype construction tool based on the de Bruijn graph (DBG). KSNP leverages the ability of DBG in handling high-throughput erroneous reads to tackle the challenges. Compared to other notable tools in this field, KSNP achieves at least 5-fold speedup while producing comparable haplotype results. The time required for assembling human haplotypes is reduced to nearly the data-in time.
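
For readers unfamiliar with the data structure named above, the following is a generic, textbook-style sketch of a de Bruijn graph built from k-mers of sequencing reads, with edge multiplicities used to drop rarely seen (likely erroneous) edges. It is not KSNP's implementation, which the abstract does not describe in detail; the function names, the toy reads, and the pruning threshold are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def build_de_bruijn_edges(reads: Iterable[str], k: int) -> Counter:
    """Count edges of a de Bruijn graph: each k-mer links its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    edges: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def prune_low_support(edges: Counter, min_count: int) -> Dict[Edge, int]:
    """Keep only edges observed at least `min_count` times."""
    return {edge: count for edge, count in edges.items() if count >= min_count}


# Toy reads: the third read carries a single-base difference (T -> A).
toy_reads = ["ACGTAC", "ACGTAC", "ACGAAC"]
graph = build_de_bruijn_edges(toy_reads, k=3)
supported = prune_low_support(graph, min_count=2)

print(graph[("AC", "CG")])   # edge shared by all three reads
print(sorted(supported))     # edges seen in at least two reads
```

In this toy case the edges unique to the third read fall below the threshold, which is the basic way such graphs tolerate erroneous reads.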


Subject(s)
Algorithms , Polymorphism, Single Nucleotide , Humans , Haplotypes/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Software
18.
BMC Genomics ; 25(1): 365, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622536

ABSTRACT

BACKGROUND: Microbial genomes are largely composed of protein-coding sequences, yet some genomes contain many pseudogenes caused by frameshifts or internal stop codons. These pseudogenes are believed to result from gene degradation during evolution but could also be technical artifacts of genome sequencing or assembly. RESULTS: Using a combination of observational and experimental data, we show that many putative pseudogenes are attributable to errors that are incorporated into genomes during assembly. Within 126,564 publicly available genomes, we observed that nearly identical genomes often substantially differed in pseudogene counts. Causal inference implicated assembler, sequencing platform, and coverage as likely causative factors. Reassembly of genomes from raw reads confirmed that each variable affects the number of putative pseudogenes in an assembly. Furthermore, simulated sequencing reads corroborated our observations that the quality and quantity of raw data can significantly impact the number of pseudogenes in an assembler-dependent fashion. The number of unexpected pseudogenes due to internal stops was highly correlated (R² = 0.96) with average nucleotide identity to the ground truth genome, implying relative pseudogene counts can be used as a proxy for overall assembly correctness. Applying our method to assemblies in RefSeq resulted in rejection of 3.6% of assemblies due to significantly elevated pseudogene counts. Reassembly from real reads obtained from high coverage genomes showed considerable variability in spurious pseudogenes beyond that observed with simulated reads, reinforcing the finding that high coverage is necessary to mitigate assembly errors. CONCLUSIONS: Collectively, these results demonstrate that many pseudogenes in microbial genome assemblies are actually genes. Our results suggest that high read coverage is required for correct assembly and indicate that an inflated number of pseudogenes due to internal stops is a sign of poor overall assembly quality.
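
To illustrate what "pseudogenes due to internal stops" means in practice, here is a minimal sketch that scans an in-frame coding sequence for stop codons occurring before the final codon. It assumes the three standard stop codons (TAA, TAG, TGA) and a sequence that starts in frame; it does not handle frameshifts or alternative genetic codes, and it is not the authors' pipeline. The function name and the example sequences are hypothetical.

```python
from typing import List

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> List[int]:
    """Return 0-based codon indices of stop codons that occur
    before the last complete codon of an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


# Hypothetical sequences: an intact gene and one with a premature stop.
intact = "ATGAAAGGCTAA"       # ATG AAA GGC TAA
disrupted = "ATGAAATAGGGCTAA"  # ATG AAA TAG GGC TAA

print(internal_stop_positions(intact))
print(internal_stop_positions(disrupted))
```

A gene call flagged this way could reflect real gene decay or, as the abstract argues, a base error introduced during sequencing or assembly.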


Subject(s)
Genome, Bacterial , Pseudogenes , Pseudogenes/genetics , Chromosome Mapping , Base Sequence , Genome, Microbial , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
19.
Virol J ; 21(1): 86, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622686

ABSTRACT

BACKGROUND: Viruses have notable effects on agroecosystems, wherein they can adversely affect plant health and cause problems (e.g., increased biosecurity risks and economic losses). However, our knowledge of their diversity and interactions with specific host plants in ecosystems remains limited. To enhance our understanding of the roles that viruses play in agroecosystems, comprehensive analyses of the viromes of a wide range of plants are essential. High-throughput sequencing (HTS) techniques are useful for conducting impartial and unbiased investigations of plant viromes, ultimately forming a basis for generating further biological and ecological insights. This study was conducted to thoroughly characterize the viral community dynamics in individual plants. RESULTS: An HTS-based virome analysis in conjunction with proximity sampling and a tripartite network analysis were performed to investigate the viral diversity in chunkung (Cnidium officinale) plants. We identified 61 distinct chunkung plant-associated viruses (27 DNA and 34 RNA viruses) from 21 known genera and 6 unclassified genera in 14 known viral families. Notably, 12 persistent viruses (7 DNA and 5 RNA viruses) were exclusive to dwarfed chunkung plants. The detection of viruses from the families Partitiviridae, Picobirnaviridae, and Spinareoviridae only in the dwarfed plants suggested that they may contribute to the observed dwarfism. The co-infection of chunkung by multiple viruses is indicative of a dynamic and interactive viral ecosystem with significant sequence variability and evidence of recombination. CONCLUSIONS: We revealed the viral community involved in chunkung. Our findings suggest that chunkung serves as a significant reservoir for a variety of plant viruses. Moreover, the co-infection rate of individual plants was unexpectedly high. Future research will need to elucidate the mechanisms enabling several dozen viruses to co-exist in chunkung. Nevertheless, the important insights into the chunkung virome generated in this study may be relevant to developing effective plant viral disease management and control strategies.


Subject(s)
Coinfection , Dwarfism , Plant Viruses , RNA Viruses , Humans , Virome , Ecosystem , Cnidium/genetics , RNA, Viral/genetics , High-Throughput Nucleotide Sequencing/methods , Plant Viruses/genetics , DNA , Phylogeny
20.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38626722

ABSTRACT

BACKGROUND: Most currently available reference genomes lack the sequence map of sex-limited (such as Y and W) chromosomes, which results in incomplete assemblies that hinder further research on sex chromosomes. Recent advancements in long-read sequencing and population sequencing have provided the opportunity to assemble sex-limited chromosomes without the traditional complicated experimental efforts. FINDINGS: We introduce the first computational method, Sorting long Reads of Y or other sex-limited chromosome (SRY), which achieves improved assembly results compared to flow sorting. Specifically, SRY outperforms in the heterochromatic region and demonstrates comparable performance in other regions. Furthermore, SRY enhances the capabilities of the hybrid assembly software, resulting in improved continuity and accuracy. CONCLUSIONS: Our method enables true complete genome assembly and facilitates downstream research of sex-limited chromosomes.


Subject(s)
Genome , Sex Chromosomes , Sex Chromosomes/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods